90 research outputs found

Means and covariance functions for geostatistical compositional data: an axiomatic approach

Author: A Kolmogorov
D Billheimer
Denis Allard
DJJ Walvoort
G Mateu-Figueras
H Wackernagel
J Aczél
J Aczél
J Aitchison
J Aitchison
J Aitchison
J Matkowski
JD Helterbrand
JJ Egozcue
JJ Egozcue
JL Scealy
JP Chilès
KL Arrow
N Cressie
P Bogaert
R Tolosana-Delgado
RF Shurtz
RY Liu
Thierry Marchant
V Pawlowsky-Glahn
V Pawlowsky-Glahn
WE Miller
WE Sharp
Y Vardi
Y Zuo
Publication date: 23/10/2017

This work focuses on the characterization of the central tendency of a sample of compositional data. It provides new results about theoretical properties of means and covariance functions for compositional data, with an axiomatic perspective. Original results that shed new light on the geostatistical modeling of compositional data are presented. As a first result, it is shown that the weighted arithmetic mean is the only central tendency characteristic satisfying a small set of axioms, namely continuity, reflexivity and marginal stability. Moreover, this set of axioms also implies that the weights must be identical for all parts of the composition. This result has deep consequences for the spatial multivariate covariance modeling of compositional data. In a geostatistical setting, it is shown as a second result that the proportional model of covariance functions (i.e., the product of a covariance matrix and a single correlation function) is the only model that provides identical kriging weights for all components of the compositional data. As a consequence of these two results, the proportional model of covariance functions is the only covariance model compatible with reflexivity and marginal stability.
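As a reading aid, the two objects named in this abstract can be written out as follows. The notation is an assumption made here, not taken from the paper: $\mathbf{x}_i$ are the $n$ sampled compositions, $w_i$ the weights shared by all parts, $\boldsymbol{\Sigma}$ a covariance matrix, $\rho$ a correlation function and $h$ a spatial lag. Under this notation, reflexivity (the mean of identical compositions is that composition) forces the weights to sum to one.

```latex
% Weighted arithmetic mean with the same weights w_i for every part
% (symbols are illustrative assumptions, not the paper's notation)
\bar{\mathbf{x}} = \sum_{i=1}^{n} w_i \, \mathbf{x}_i ,
\qquad \sum_{i=1}^{n} w_i = 1 .

% Proportional covariance model: one covariance matrix times one
% correlation function of the lag h
\mathbf{C}(h) = \boldsymbol{\Sigma} \, \rho(h) .
```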

arXiv.org e-Print Archive

Crossref

Ghent University Academic Bibliography

HAL Descartes

Estimating the evidence of selection and the reliability of inference in unigenic evolution

Background: Unigenic evolution is a large-scale mutagenesis experiment used to identify residues that are potentially important for protein function. Both currently used methods for the analysis of unigenic evolution data analyze 'windows' of contiguous sites, a strategy that increases statistical power but incorrectly assumes that functionally critical sites are contiguous. In addition, both methods require the questionable assumption of asymptotically large sample size due to the presumption of approximate normality. Results: We develop a novel approach, termed the Evidence of Selection (EoS), that removes the assumption that functionally important sites are adjacent in sequence and explicitly models the effects of limited sample size. Precise statistical derivations show that the EoS score can be easily interpreted as an expected log-odds-ratio between two competing hypotheses, namely, the hypothetical presence or absence of functional selection for a given site. Using the EoS score, we then develop selection criteria by which functionally important yet non-adjacent sites can be identified. An approximate power analysis is also developed to estimate the reliability of inference given the data. We validate and demonstrate the practical utility of our method by analysis of the homing endonuclease I-BmoI, comparing our predictions with the results of existing methods. Conclusions: Our method is able to assess both the evidence of selection at individual amino acid sites and estimate the reliability of those inferences. Experimental validation with I-BmoI proves its utility to identify functionally important residues of poorly characterized proteins, demonstrating increased sensitivity over previous methods without loss of specificity.
With the ability to guide the selection of precise experimental mutagenesis conditions, our method helps make unigenic analysis a more broadly applicable technique with which to probe protein function. Availability: Software to compute, plot, and summarize EoS data is available as an open-source package called 'unigenic' for the 'R' programming language at http://www.fernandes.org/txp/article/13/an-analytical-framework-for-unigenic-evolution.

Scholarship@Western

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Impact of index hopping and bias towards the reference allele on accuracy of genotype calls from low-coverage sequencing

Author: Alan J. Mileham
AM Bolger
B Paten
C Xu
D Aird
DYC Brandt
GL Owens
Gregor Gorjanc
H Li
H Li
J Aitchison
JD Wall
JJ Egozcue
JM Hickey
John M. Hickey
M Costello
MA DePristo
Mara Battagin
Martin Johnsson
MG Ross
P Danecek
R Ros-Freixedes
R Ros-Freixedes
Roger Ros-Freixedes
RW Davies
S Gonen
S Hoecke Van den
Steve D. Rounsley
TS Korneliussen
X Chen
Y Benjamini
Y Guo
Y Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Background: Inherent sources of error and bias that affect the quality of sequence data include index hopping and bias towards the reference allele. The impact of these artefacts is likely greater for low-coverage data than for high-coverage data because low-coverage data has scant information and many standard tools for processing sequence data were designed for high-coverage data. With the proliferation of cost-effective low-coverage sequencing, there is a need to understand the impact of these errors and bias on resulting genotype calls from low-coverage sequencing. Results: We used a dataset of 26 pigs sequenced both at 2× with multiplexing and at 30× without multiplexing to show that index hopping and bias towards the reference allele due to alignment had little impact on genotype calls. However, pruning of alternative haplotypes supported by a number of reads below a predefined threshold, which is a default and desired step of some variant callers for removing potential sequencing errors in high-coverage data, introduced an unexpected bias towards the reference allele when applied to low-coverage sequence data. This bias reduced best-guess genotype concordance of low-coverage sequence data by 19.0 absolute percentage points. Conclusions: We propose a simple pipeline to correct the preferential bias towards the reference allele that can occur during variant discovery and we recommend that users of low-coverage sequence data be wary of unexpected biases that may be produced by bioinformatic tools that were designed for high-coverage sequence data.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

Edinburgh Research Explorer

Repositori Obert UdL

Water chemistry: are new challenges possible from CoDA (compositional data analysis) point of view?

Author: A Buccianti
A Buccianti
A Buccianti
A Kleidon
A Menció
A. Buccianti
A. Buccianti
Andrew J. E. Seely
Chongbin Zhao
CJ Allegre
D Billheimer
DDR Owen
F Chayes
Frits Agterberg
G Nicolis
Jenna L. Shelton
JG Holden
JJ Egozcue
JJ Egozcue
L Dai
M Mitzenmacher
M Scheffer
M Scheffer
MA Engle
MA Engle
MB Cardenas
MB Cardenas
MMJW Rooij van
NI Fisher
RM Garrels
S Thió-Henestrosa
SL Shvartsev
SL Shvartsev
V Dakos
V Pawlowsky-Glahn
V Pawlowsky-Glahn
Z Liu
Publication date: 25/06/2018

John Aitchison died in December 2016, leaving behind an important inheritance: to continue to explore the fascinating world of compositional data. However, notwithstanding the progress that we have made in this field of investigation and the diffusion of CoDA theory across different research areas, a lot of work still has to be done, particularly in geochemistry. In fact, most of the papers published in international journals that manage compositional data ignore their nature and their consequent peculiar statistical properties. On the other hand, when CoDA principles are applied, several efforts are often made to continue to consider the log-ratio transformed variables, for example the centered log-ratio ones, as the original ones, demonstrating a sort of resistance to thinking in relative terms. This appears to be a very strange behavior since geochemists are used to ratios and their analysis is the base of the experimental calibration when standards are evolved to set the instruments. In this chapter some challenges are presented by exploring water chemistry data with the aim to invite people to capture the essence of thinking in a relative and multivariate way, since this is the path to obtain a description of natural processes as complete as possible.
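The centered log-ratio (clr) transform mentioned in this abstract can be sketched in a few lines. This is a minimal illustration and not code from the chapter: the function name `clr` and the three-part sample are hypothetical. Each clr coordinate is the log of a part relative to the geometric mean of all parts, which is why it should not be read as if it were the original variable.

```python
import math


def clr(parts):
    """Centered log-ratio transform of a composition with strictly positive parts."""
    if any(p <= 0 for p in parts):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]


# Hypothetical three-part water sample (e.g. concentrations in mg/L).
sample = [40.0, 10.0, 25.0]
coords = clr(sample)

# Rescaling the sample (changing units or closing to 100 %) leaves the
# clr coordinates unchanged, because only ratios between parts matter.
rescaled = clr([10.0 * p for p in sample])
```

The coordinates sum to zero by construction, so each one depends on every part of the composition and not only on the part it is named after.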

Crossref

Open Access Repository

Automated interpretation of 3D laserscanned point clouds for plant organ segmentation

Author: A Paproki
AK Mahlein
Anne-Katrin Mahlein
C Thurau
C Thurau
CA Berdugo
CD Manning
CM Bishop
DH Lorenz
E Amigó
F Fiorani
I Vajda
J Aitchison
J Aitchison
J Aitchison
J Dupuis
JA Martín-Fernández
JJ Egozcue
Kristian Kersting
M Wahabzada
Mirwaes Wahabzada
NV Vapnik
R Sozzani
RB Rusu
RPdM Frasson
RT Furbank
S Paulus
S Paulus
S Paulus
Stefan Paulus
Y Zhao
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Normalization and microbial differential abundance strategies depend upon data characteristics

Author: A Brewer
A Reynolds
AD Fernandes
BD Wagner
BS Kim
C Lozupone
C Lozupone
C Soneson
CA Lozupone
CA Lozupone
CA Lozupone
CL Lauber
D Gevers
D Yu
DA Carcer
DM Witten
EK Costello
F Rapaport
I Holmes
J Aitchison
J Friedman
J Jernvall
J Qin
JA Martin-Fernandez
JC Gower
JE Mosimann
JG Caporaso
JG Caporaso
JG Caporaso
JG Caporaso
JH Bullard
JJ Egozcue
JN Paulson
JN Paulson
JR Bray
JR White
K Kennedy
K Pearson
LA David
M Greenacre
M Pop
MA Dillies
MC Horner-Devine
MD Robinson
MD Robinson
MJ Anderson
MJ Anderson
N Fierer
NJ Gotelli
P Piombino
PD Schloss
PI Costea
PJ Turnbaugh
PL Auer
PS La Rosa
RC Team
RE Ley
RK Colwell
RL Rodriguez
S Anders
S Mandal
SJ Salter
SR Connolly
VK Ridaura
Y Benjamini
Y Vazquez-Baeza
YB Cheung
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Source patterns of potentially toxic elements (PTEs) and mining activity contamination level in soils of Taltal city (northern Chile)

Author: A Kabata-Pendias
A Koschinsky
A Lima
A Petrik
A Petrik
A Violante
A Wali
AA Odewande
AMG Figueiredo
Annamaria Lima
Antonio Panico
Arturo Reyes
Benedetto De Vivo
C Reimann
C Reimann
C Reimann
CE Azevedo-Silva
D Cicchella
D Linden
DS Manta
EL Ander
ER Mouta
ESRI (Environmental Systems Research Institute)
F Ajmone-Marsan
F Luz
G Müller
G Müller
GH Guo
HI Ezeigbo
HS Lim
J Aitchison
J Bundschuh
J Han
J Sucharovà
JJ Egozcue
K Hron
K Naicker
L Hakanson
Linda Daniele
LM Shuman
M Biasioli
M Thiombane
M Thiombane
M Thiombane
Marcello Di Bonito
Matar Thiombane
MT Guillén
P Filzmoser
P Filzmoser
P Suchan
P Tume
Q Cheng
Q Cheng
Q Cheng
R Chester
R Salminen
R Zuo
R Zuo
RS Stahl
S Albanese
S Albanese
S Maas
S Wu
T Batjargal
T Prapamontol
T Tarvainen
V Pawlowsky-Glahn
V Pawlowsky-Glahn
X Li
XS Luo
Y Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Mining activities are among the main sources of potentially toxic elements (PTEs) in the environment, which constitute a real concern worldwide, especially in developing countries. These activities have been carried out for more than a century in Chile, South America, where, as evidence of incorrect waste disposal practices, several abandoned mining waste deposits were left behind. This study aimed to understand the multi-element geochemistry, source patterns and mobility of PTEs in soils of the Taltal urban area (northern Chile). Topsoil samples (n = 125) were collected in the urban area of Taltal city (6 km²), where physicochemical properties (redox potential, electric conductivity and pH) as well as chemical concentrations for 35 elements were determined by inductively coupled plasma optical emission spectrometry. Data were treated following a robust workflow, which included factor analysis (based on ilr-transformed data), a new robust compositional contamination index (RCCI), and fractal/multi-fractal interpolation in a GIS environment. This approach made it possible to identify significant elemental associations, distinguishing pools of elements related either to the geological background, to pedogenic processes accompanying soil formation, or to anthropogenic activities. In particular, the study eventually focused on a pool of 6 PTEs (As, Cd, Cr, Cu, Pb, and Zn), their spatial distribution in Taltal city, and the potential sources and mechanisms controlling their concentrations. Results showed generally low baseline values of PTEs in most sites of the surveyed area. At a smaller number of sites, however, higher concentrations of As, Cd, Cu, Zn and Pb were found. These corresponded to a very high RCCI contamination level and were correlated with potential anthropogenic sources, such as the abandoned mining waste deposits in the north-eastern part of Taltal city.
This study provided new and significant insight into the contamination levels of Taltal city and their links with anthropogenic activities. Further research is considered crucial to extend this assessment to the entire region. This would provide a comprehensive overview and vital information for the development of intervention limits and guide environmental legislation for these pollutants in Chilean soils.

Crossref

Nottingham Trent Institutional Repository (IRep)

Repositorio Académico de la Universidad de Chile